VLSI Architecture of GMM Processing and Viterbi Decoder for 60, 000-Word Real-Time Continuous Speech Recognition

نویسندگان

  • Hiroki Noguchi
  • Kazuo Miura
  • Tsuyoshi Fujinaga
  • Takanobu Sugahara
  • Hiroshi Kawaguchi
  • Masahiko Yoshimoto
چکیده

We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74 MHz) and 84.04% memory bandwidth reduction (549.91 MB/s) for real-time 60-k word continuous speech recognition. key words: speech recognition, hidden Markov model (HMM), VLSI architecture

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A 40-NM 54-MW 3×-real-time VLSI processor for 60-kWord continuous speech recognition

This paper describes a low-power VLSI chip for speakerindependent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). We implement parallel and pipelined architecture for GMM computation and Viterbi processing. It includes a 8-path Viterbi transition architecture to maximize the processing speed and adopts tri-gram language model to improve the recogni...

متن کامل

A 54-mw 3×-real-time 60-kword continuous speech recognition processor VLSI

This paper describes a low-power VLSI chip for speakerindependent 60-kWord continuous speech recognition. We implement parallel and pipelined architecture for GMM computation and Viterbi processing. It includes a 8-path Viterbi transition architecture to maximize the processing speed and adopts tri-gram language model to improve the recognition accuracy. A two-level cache architecture is implem...

متن کامل

A 40-nm 168-mW 2.4×-real-time VLSI processor for 60-kWord continuous speech recognition

This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). Our implementation includes a compression–decoding scheme to reduce the external memory bandwidth for Gaussian Mixture Model (GMM) computation and multi-path Viterbi transition units. We optimize the internal SRAM size using the max-ap...

متن کامل

A 168-mW 2.4X-Real-Time 60-kWord Continuous Speech Recognition Processor VLSI

This paper describes a low-power VLSI chip for speakerindependent 60-kWord continuous speech recognition based on a contextdependent Hidden Markov Model (HMM). It features a compressiondecoding scheme to reduce the external memory bandwidth for Gaussian Mixture Model (GMM) computation and multi-path Viterbi transition units. We optimize the internal SRAM size using the max-approximation GMM cal...

متن کامل

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEICE Transactions

دوره 94-C  شماره 

صفحات  -

تاریخ انتشار 2011